{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "view-in-colab"
      },
      "source": [
        "<a href=\"https://colab.research.google.com/github/pathwaycom/pathway/blob/main/examples/notebooks/tutorials/windowby_manual.ipynb\" target=\"_parent\"><img src=\"https://pathway.com/assets/colab-badge.svg\" alt=\"Run In Colab\" class=\"inline\"/></a>"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "notebook-instructions",
      "source": [
        "# Installing Pathway with Python 3.10+\n",
        "\n",
        "In the cell below, we install Pathway into a Python 3.10+ Linux runtime.\n",
        "\n",
        "> **If you are running in Google Colab, please run the colab notebook (Ctrl+F9)**, disregarding the 'not authored by Google' warning.\n",
        "> \n",
        "> **The installation and loading time is less than 1 minute**.\n"
      ],
      "metadata": {}
    },
    {
      "cell_type": "code",
      "id": "pip-installation-pathway",
      "source": [
        "%%capture --no-display\n",
        "!pip install --prefer-binary pathway"
      ],
      "execution_count": null,
      "outputs": [],
      "metadata": {}
    },
    {
      "cell_type": "code",
      "id": "logging",
      "source": [
        "import logging\n",
        "\n",
        "logging.basicConfig(level=logging.CRITICAL)"
      ],
      "execution_count": null,
      "outputs": [],
      "metadata": {}
    },
    {
      "cell_type": "markdown",
      "id": "4",
      "metadata": {},
      "source": [
        "# Windowby - Reduce\n",
        "In this manu\\[a\\]l, you will learn how to aggregate data with the windowby-reduce scheme.\n",
        "\n",
        "Pathway offers powerful features for time series data manipulation. One such feature is the `windowby` function, which allows for intricate data segmentation based on specified criteria.\n",
        "\n",
        "The `windowby` function can operate in three distinct modes\u2014session, sliding, and tumbling\u2014which are determined by the type of windowing function you pass to it.\n",
        "* Session Window: Groups adjacent elements based on activity and inactivity periods.\n",
        "* Sliding Window: Groups elements in overlapping windows of a specified length.\n",
        "* Tumbling Window: Groups elements in non-overlapping windows of a specified length.\n",
        "\n",
        "\n",
        "\n",
        "![Illustration of Window types](https://pathway.comassets/content/documentation/table-operations/windowby-types.png)\n\n",
        "\n",
        "This guide focuses on exploring these different types, demonstrating how each one can be used to achieve unique and important data analysis tasks.\n",
        "\n",
        "The data we're going to use is about... drumroll please... chocolate consumption! Let's suppose we have a dataset that tracks the amount of chocolate eaten during the day by a group of chocoholics. So, without further ado, let's get started."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "5",
      "metadata": {},
      "outputs": [],
      "source": [
        "import pathway as pw"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "6",
      "metadata": {
        "lines_to_next_cell": 2
      },
      "outputs": [],
      "source": [
        "# To use advanced features with Pathway Scale, get your free license key from\n",
        "# https://pathway.com/features and paste it below.\n",
        "# To use Pathway Community, comment out the line below.\n",
        "pw.set_license_key(\"demo-license-key-with-telemetry\")"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "7",
      "metadata": {},
      "outputs": [],
      "source": [
        "fmt = \"%Y-%m-%dT%H:%M:%S\"\n",
        "\n",
        "table = pw.debug.table_from_markdown(\n",
        "    \"\"\"\n",
        "    | time                  | name            | chocolate_bars\n",
        " 0  | 2023-06-22T09:12:34   | Fudge_McChoc    | 2\n",
        " 1  | 2023-06-22T09:23:56   | Ganache_Gobbler | 2\n",
        " 2  | 2023-06-22T09:45:20   | Truffle_Muncher | 1\n",
        " 3  | 2023-06-22T09:06:30   | Fudge_McChoc    | 1\n",
        " 4  | 2023-06-22T10:11:42   | Ganache_Gobbler | 2\n",
        " 5  | 2023-06-22T10:32:55   | Truffle_Muncher | 2\n",
        " 6  | 2023-06-22T11:07:18   | Fudge_McChoc    | 3\n",
        " 7  | 2023-06-22T11:23:12   | Ganache_Gobbler | 1\n",
        " 8  | 2023-06-22T11:49:29   | Truffle_Muncher | 2\n",
        " 9  | 2023-06-22T12:03:37   | Fudge_McChoc    | 4\n",
        " 10 | 2023-06-22T12:21:05   | Ganache_Gobbler | 3\n",
        " 11 | 2023-06-22T13:38:44   | Truffle_Muncher | 3\n",
        " 12 | 2023-06-22T14:04:12   | Fudge_McChoc    | 1\n",
        " 13 | 2023-06-22T15:26:39   | Ganache_Gobbler | 4\n",
        " 14 | 2023-06-22T15:55:00   | Truffle_Muncher | 1\n",
        " 15 | 2023-06-22T16:18:24   | Fudge_McChoc    | 2\n",
        " 16 | 2023-06-22T16:32:50   | Ganache_Gobbler | 1\n",
        " 17 | 2023-06-22T17:58:06   | Truffle_Muncher | 2\n",
        "\"\"\"\n",
        ").with_columns(time=pw.this.time.dt.strptime(fmt))"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "8",
      "metadata": {},
      "source": [
        "## Temporal Session Windowing\n",
        "The `session` windowing function is designed for grouping together adjacent time events based on a specific condition. This can either be a maximum time difference between events or a custom condition defined by you.\n",
        "\n",
        "For instance, let's say you are curious about the binge-eating sessions of the chocoholics. You'd want to group all consecutive records where the gap between the chocolate eating times is less than or equal to some period of time.\n",
        "\n",
        "Let's check out an example:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "9",
      "metadata": {},
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "name            | session_start       | session_end         | chocolate_bars\n",
            "Fudge_McChoc    | 2023-06-22 09:06:30 | 2023-06-22 12:03:37 | 10\n",
            "Fudge_McChoc    | 2023-06-22 14:04:12 | 2023-06-22 14:04:12 | 1\n",
            "Fudge_McChoc    | 2023-06-22 16:18:24 | 2023-06-22 16:18:24 | 2\n",
            "Ganache_Gobbler | 2023-06-22 09:23:56 | 2023-06-22 12:21:05 | 8\n",
            "Ganache_Gobbler | 2023-06-22 15:26:39 | 2023-06-22 16:32:50 | 5\n",
            "Truffle_Muncher | 2023-06-22 09:45:20 | 2023-06-22 13:38:44 | 8\n",
            "Truffle_Muncher | 2023-06-22 15:55:00 | 2023-06-22 15:55:00 | 1\n",
            "Truffle_Muncher | 2023-06-22 17:58:06 | 2023-06-22 17:58:06 | 2\n"
          ]
        }
      ],
      "source": [
        "from datetime import timedelta\n",
        "\n",
        "result = table.windowby(\n",
        "    table.time,\n",
        "    window=pw.temporal.session(max_gap=timedelta(hours=2)),\n",
        "    instance=table.name,\n",
        ").reduce(\n",
        "    pw.this.name,\n",
        "    session_start=pw.this._pw_window_start,\n",
        "    session_end=pw.this._pw_window_end,\n",
        "    chocolate_bars=pw.reducers.sum(pw.this.chocolate_bars),\n",
        ")\n",
        "\n",
        "# Print the result\n",
        "pw.debug.compute_and_print(result, include_id=False)"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "10",
      "metadata": {},
      "source": [
        "## Temporal Sliding Windowing\n",
        "\n",
        "Next, let's slide into sliding windows. Sliding windows move through your data at a specific step (hop) and create a window of a specific duration. This is like sliding a magnifying glass over your data to focus on specific chunks at a time.\n",
        "\n",
        "Let's find the chocolate consumption within sliding windows of duration 10 hours, sliding every 3 hours. This could be handy for identifying peak chocolate-eating times!"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "11",
      "metadata": {},
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "name            | window_start        | window_end          | chocolate_bars\n",
            "Fudge_McChoc    | 2023-06-22 00:00:00 | 2023-06-22 10:00:00 | 3\n",
            "Fudge_McChoc    | 2023-06-22 03:00:00 | 2023-06-22 13:00:00 | 10\n",
            "Fudge_McChoc    | 2023-06-22 06:00:00 | 2023-06-22 16:00:00 | 11\n",
            "Fudge_McChoc    | 2023-06-22 09:00:00 | 2023-06-22 19:00:00 | 13\n",
            "Fudge_McChoc    | 2023-06-22 12:00:00 | 2023-06-22 22:00:00 | 7\n",
            "Fudge_McChoc    | 2023-06-22 15:00:00 | 2023-06-23 01:00:00 | 2\n",
            "Ganache_Gobbler | 2023-06-22 00:00:00 | 2023-06-22 10:00:00 | 2\n",
            "Ganache_Gobbler | 2023-06-22 03:00:00 | 2023-06-22 13:00:00 | 8\n",
            "Ganache_Gobbler | 2023-06-22 06:00:00 | 2023-06-22 16:00:00 | 12\n",
            "Ganache_Gobbler | 2023-06-22 09:00:00 | 2023-06-22 19:00:00 | 13\n",
            "Ganache_Gobbler | 2023-06-22 12:00:00 | 2023-06-22 22:00:00 | 8\n",
            "Ganache_Gobbler | 2023-06-22 15:00:00 | 2023-06-23 01:00:00 | 5\n",
            "Truffle_Muncher | 2023-06-22 00:00:00 | 2023-06-22 10:00:00 | 1\n",
            "Truffle_Muncher | 2023-06-22 03:00:00 | 2023-06-22 13:00:00 | 5\n",
            "Truffle_Muncher | 2023-06-22 06:00:00 | 2023-06-22 16:00:00 | 9\n",
            "Truffle_Muncher | 2023-06-22 09:00:00 | 2023-06-22 19:00:00 | 11\n",
            "Truffle_Muncher | 2023-06-22 12:00:00 | 2023-06-22 22:00:00 | 6\n",
            "Truffle_Muncher | 2023-06-22 15:00:00 | 2023-06-23 01:00:00 | 3\n"
          ]
        }
      ],
      "source": [
        "result = table.windowby(\n",
        "    table.time,\n",
        "    window=pw.temporal.sliding(duration=timedelta(hours=10), hop=timedelta(hours=3)),\n",
        "    instance=table.name,\n",
        ").reduce(\n",
        "    name=pw.this._pw_instance,\n",
        "    window_start=pw.this._pw_window_start,\n",
        "    window_end=pw.this._pw_window_end,\n",
        "    chocolate_bars=pw.reducers.sum(pw.this.chocolate_bars),\n",
        ")\n",
        "\n",
        "# Print the result\n",
        "pw.debug.compute_and_print(result, include_id=False)\n"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "12",
      "metadata": {},
      "source": [
        "This gives you detailed insights about chocolate consumption over different time windows."
      ]
    },
    {
      "cell_type": "markdown",
      "id": "13",
      "metadata": {},
      "source": [
        "## Temporal Tumbling Windowing\n",
        "\n",
        "Finally, let's tumble through tumbling windows. Tumbling windows divide our data into distinct, non-overlapping intervals of a given length.\n",
        "\n",
        "Let's divide the time series into tumbling windows of 5 hours each to see how our chocolate consumption varies over distinct periods."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "14",
      "metadata": {},
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "name            | window_start        | window_end          | chocolate_bars\n",
            "Fudge_McChoc    | 2023-06-22 09:00:00 | 2023-06-22 14:00:00 | 10\n",
            "Fudge_McChoc    | 2023-06-22 14:00:00 | 2023-06-22 19:00:00 | 3\n",
            "Ganache_Gobbler | 2023-06-22 09:00:00 | 2023-06-22 14:00:00 | 8\n",
            "Ganache_Gobbler | 2023-06-22 14:00:00 | 2023-06-22 19:00:00 | 5\n",
            "Truffle_Muncher | 2023-06-22 09:00:00 | 2023-06-22 14:00:00 | 8\n",
            "Truffle_Muncher | 2023-06-22 14:00:00 | 2023-06-22 19:00:00 | 3\n"
          ]
        }
      ],
      "source": [
        "result = table.windowby(\n",
        "    table.time,\n",
        "    window=pw.temporal.tumbling(duration=timedelta(hours=5)),\n",
        "    instance=table.name,\n",
        ").reduce(\n",
        "    name=pw.this._pw_instance,\n",
        "    window_start=pw.this._pw_window_start,\n",
        "    window_end=pw.this._pw_window_end,\n",
        "    chocolate_bars=pw.reducers.sum(pw.this.chocolate_bars),\n",
        ")\n",
        "\n",
        "# Print the result\n",
        "pw.debug.compute_and_print(result, include_id=False)"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "15",
      "metadata": {},
      "source": [
        "## Conclusion\n",
        "\n",
        "In this guide, you've mastered the use of the windowby-reduce scheme in the Pathway library, a robust tool for time-series data aggregation. The three types of window functions\u2014session, sliding, and tumbling\u2014have been unveiled, each with its unique way of segmenting data. A playful example of chocolate consumption illuminated their practical application. As you continue to delve into data analysis, check out the tutorial [Detecting suspicious user activity with Tumbling Window group-by](/developers/templates/suspicious_activity_tumbling_window), which utilizes the tumbling window function to spot unusual user behavior. Continue exploring, and elevate your data analysis prowess."
      ]
    }
  ],
  "metadata": {
    "jupytext": {
      "cell_metadata_filter": "-all",
      "main_language": "python",
      "notebook_metadata_filter": "-all"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.11.11"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 5
}